Bayesian Modelling Of Vowel Segment Duration For Text-to-Speech Synthesis Using Distinctive Features
Abstract
We apply a Bayesian belief network (BN) approach to vowel duration modelling: vowel segment duration is modelled as a hybrid Bayesian network of discrete and continuous nodes, where the nodes represent linguistic factors that affect segment duration. Factor interaction is modelled concisely by causal relationships among the nodes in a directed acyclic graph (DAG). New to the present research, we model segment identity as a set of distinctive features: frontness, height, length, and roundness. In addition, the BNs were augmented with a word class feature (content vs. function). We experimented with different BNs and compared the results of the belief network model with those of sums-of-products (SoP) and classification and regression tree (CART) models. All three models were trained and tested on the same data. In terms of RMS error and correlation coefficient, our BN model performs better than the CART and SoP models.
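As a rough illustration of the idea in the abstract, the sketch below treats duration as a single continuous node whose Gaussian parameters depend on a configuration of discrete parent features, and then scores predictions with RMS error and the correlation coefficient. It is not the authors' network: the choice of parents, the fallback to a global mean for unseen configurations, the function names, and all duration values are assumptions made up for this example.

```python
import math
from collections import defaultdict
from statistics import mean


def fit_conditional_gaussian(rows):
    """Fit one Gaussian (mean, variance) per discrete parent configuration.

    rows: list of (config, duration_ms), where config is a tuple of
    discrete feature values, e.g. (height, length, word_class).
    """
    groups = defaultdict(list)
    for config, duration in rows:
        groups[config].append(duration)
    global_mean = mean(d for _, d in rows)
    table = {}
    for config, durations in groups.items():
        mu = mean(durations)
        var = sum((d - mu) ** 2 for d in durations) / len(durations)
        table[config] = (mu, var)
    return table, global_mean


def predict(table, global_mean, config):
    """Predict the conditional mean; unseen configurations fall back
    to the global mean (an assumption of this sketch)."""
    return table[config][0] if config in table else global_mean


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted durations."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Invented toy data: (height, length, word_class) -> duration in ms.
train = [
    (("high", "short", "function"), 50.0),
    (("high", "short", "function"), 60.0),
    (("low", "long", "content"), 140.0),
    (("low", "long", "content"), 160.0),
]
test = [
    (("high", "short", "function"), 58.0),
    (("low", "long", "content"), 145.0),
    (("mid", "short", "content"), 100.0),  # configuration unseen in training
]

table, global_mean = fit_conditional_gaussian(train)
observed = [d for _, d in test]
predicted = [predict(table, global_mean, c) for c, _ in test]
print("RMSE (ms):", rmse(observed, predicted))
print("Correlation:", pearson(observed, predicted))
```

A full hybrid BN would also model dependencies among the discrete nodes and use inference to handle missing parent values; the per-configuration table here only shows how a continuous node can be conditioned on discrete ones and how the two evaluation measures are computed.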
Similar resources
Using Bayesian belief networks for model duration in text-to-speech systems
The problems of database imbalance and factor interaction make modelling of segment duration in text-to-speech systems a challenging task. We therefore propose a probabilistic Bayesian belief network (BN) approach to tackle data sparsity and factor interaction problems. The belief network approach makes good estimations in cases of missing or incomplete data. Also, it captures factor interaction...
The Relationship Between Acoustic Characteristics and Personality Dimensions in Patients With Dysphonia
Objectives: Voice is influenced by personality. However, it is still questionable which acoustic features are influenced by personality traits. This study aimed to investigate the relationship between acoustic characteristics and personality dimensions. Methods: Thirty-three participants with dysphonia and 33 participants without dysphonia were recruited to take part in this cross-sectional st...
Investigating the effect of auditory feedback on speech production after cochlear implant surgery
The main goal of this study is to determine the effects of auditory feedback on the improvement of the speech production process in prelingually, totally deaf children who use a cochlear implant prosthesis. To this end, we recorded the speech of four prelingually deaf children with cochlear implants before and after the operation. We then extracted some static features of vowels, such as fundamental frequency, formant frequencies...
Modeling vowel duration for Japanese text-to-speech synthesis
Accurate estimation of segmental durations is crucial for natural-sounding text-to-speech (TTS) synthesis. This paper presents a model of vowel duration used in the Bell Labs Japanese TTS system. We describe the constraints on vowel devoicing, and effects of factors such as phone identity, surrounding phone identities, accentuation, syllabic structure, and phrasal position on the duration of bot...
Prosody modelling in Czech text-to-speech synthesis
This paper describes data-driven modelling of all three basic prosodic features – fundamental frequency, intensity and segmental duration – in the Czech text-to-speech system ARTIC. The fundamental frequency is generated by a model based on concatenation of automatically acquired intonational patterns. Intensity of synthesised speech is modelled by experimentally created rules which are in conf...